library(tidyverse)
library(janitor)
library(gt)
library(readxl)
In this R Markdown file, the Excel file that is read in is called analytic_data.xlsx and the resulting data frame is called EXAMPLE_DATA. Replace these with the name of your own Excel file and the name you want to give your data frame.
EXAMPLE_DATA <- read_excel("analytic_data.xlsx")  # reads the first sheet by default
EXAMPLE_DATA <- EXAMPLE_DATA %>%
  mutate(across(where(is.character), as.factor))  # convert all text columns to factors
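You may want to try the snippets before plugging in your own spreadsheet. In that case you can build a small stand-in data frame directly in R. The sketch below uses purely hypothetical toy data: the values are made up and are not the data behind the tables in this file. It reuses the placeholder column names, so it can be assigned to EXAMPLE_DATA, and it applies the same character-to-factor conversion as above. It assumes dplyr 1.0.0 or later for `across()` and `where()`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: made-up values that reuse the placeholder names
# from this file. The two categorical columns start out as character.
toy_data <- tibble(
  CATEGORICAL_VARIABLE1 = rep(c("A", "B"), each = 4),
  CATEGORICAL_VARIABLE2 = c("N", "Y", "Y", "Y", "N", "N", "N", "Y"),
  NUMERICAL_VARIABLE1   = c(2.5, 3.1, 4.0, 5.2, 1.5, 3.3, 4.8, 6.1)
)

# Convert every character column to a factor (same effect as mutate_if()).
toy_data <- toy_data %>%
  mutate(across(where(is.character), as.factor))

str(toy_data)
```
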
In all of the code below, replace EXAMPLE_DATA with the name of your data frame. Also replace NUMERICAL_VARIABLE1, CATEGORICAL_VARIABLE1 and CATEGORICAL_VARIABLE2 with the names of the variables you want to summarise.
EXAMPLE_DATA %>%
  summarise(Mean = mean(NUMERICAL_VARIABLE1),
            "Standard Deviation" = sd(NUMERICAL_VARIABLE1),
            n = n()) %>%
  gt() %>%
  fmt_number(c(Mean, "Standard Deviation"),
             decimals = 1) %>%
  tab_spanner(
    label = "Nice label for NUMERICAL_VARIABLE1",
    columns = 1:3)
| Nice label for NUMERICAL_VARIABLE1 | | |
|---|---|---|
| Mean | Standard Deviation | n |
| 4.3 | 2.3 | 20 |
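Check one thing before relying on these summaries. `mean()` and `sd()` return `NA` if the variable has any missing values, and `n()` counts rows rather than non-missing observations. If your data may contain blanks, a variant along these lines keeps the summary informative. It is shown on hypothetical toy values, and whether dropping missing values is appropriate depends on your analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with one missing value.
toy_data <- tibble(NUMERICAL_VARIABLE1 = c(2, 4, NA, 6))

toy_summary <- toy_data %>%
  summarise(
    Mean = mean(NUMERICAL_VARIABLE1, na.rm = TRUE),               # NA is ignored
    "Standard Deviation" = sd(NUMERICAL_VARIABLE1, na.rm = TRUE), # NA is ignored
    n = sum(!is.na(NUMERICAL_VARIABLE1)),                         # non-missing values only
    "n missing" = sum(is.na(NUMERICAL_VARIABLE1))                 # how many were dropped
  )

toy_summary
```
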
Mean, SD and n
EXAMPLE_DATA %>%
  group_by(CATEGORICAL_VARIABLE1) %>%
  summarise(Mean = mean(NUMERICAL_VARIABLE1),
            "Standard Deviation" = sd(NUMERICAL_VARIABLE1),
            n = n()) %>%
  gt() %>%
  fmt_number(c(Mean, "Standard Deviation"),
             decimals = 1)
| CATEGORICAL_VARIABLE1 | Mean | Standard Deviation | n |
|---|---|---|---|
| A | 4.0 | 1.4 | 10 |
| B | 4.7 | 2.9 | 10 |
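The overall and grouped summaries differ only in the grouping step. You may therefore prefer to wrap the calculation in a small function rather than edit variable names in several places. The sketch below uses dplyr's embracing operator `{{ }}`. The function name `summarise_numeric()` is hypothetical and not part of any package. The data are made-up toy values. The function returns a plain data frame, so you can still pipe the result into `gt()`.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not from any package): mean, SD and n of `num_var`,
# split by any grouping columns passed in `...` (none = overall summary).
summarise_numeric <- function(data, num_var, ...) {
  data %>%
    group_by(...) %>%
    summarise(
      Mean = mean({{ num_var }}),
      "Standard Deviation" = sd({{ num_var }}),
      n = n(),
      .groups = "drop"
    )
}

# Hypothetical toy data.
toy_data <- tibble(
  CATEGORICAL_VARIABLE1 = factor(rep(c("A", "B"), each = 3)),
  NUMERICAL_VARIABLE1   = c(1, 2, 3, 4, 6, 8)
)

overall  <- summarise_numeric(toy_data, NUMERICAL_VARIABLE1)
by_group <- summarise_numeric(toy_data, NUMERICAL_VARIABLE1, CATEGORICAL_VARIABLE1)
```
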
Five number summary, including median and quartiles
EXAMPLE_DATA %>%
  group_by(CATEGORICAL_VARIABLE1) %>%
  summarise(n = n(),
            Minimum = min(NUMERICAL_VARIABLE1),
            "First quartile" = quantile(NUMERICAL_VARIABLE1, 0.25),
            Median = median(NUMERICAL_VARIABLE1),
            "Third quartile" = quantile(NUMERICAL_VARIABLE1, 0.75),
            Maximum = max(NUMERICAL_VARIABLE1)) %>%
  gt() %>%
  fmt_number(c(Minimum, "First quartile", Median, "Third quartile", Maximum),
             decimals = 1)
| CATEGORICAL_VARIABLE1 | n | Minimum | First quartile | Median | Third quartile | Maximum |
|---|---|---|---|---|---|---|
| A | 10 | 2.5 | 2.8 | 3.8 | 4.6 | 7.0 |
| B | 10 | 1.5 | 2.7 | 4.1 | 5.8 | 10.3 |
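There is more than one convention for quartiles. `quantile()` uses its default `type = 7` interpolation, whereas base R's `fivenum()` returns Tukey's hinges, so the two can disagree on the same data. A quick check on a made-up vector:

```r
# Made-up vector of six values, purely for illustration.
x <- c(1, 2, 3, 4, 5, 6)

q <- quantile(x, c(0.25, 0.75))  # default type 7: 2.25 and 4.75
f <- fivenum(x)                  # Tukey's five-number summary: 1, 2, 3.5, 5, 6

q
f
```

If you need your quartiles to match another package's output, `quantile()` accepts a `type` argument (1 to 9) that selects a different definition.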
EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  gt() %>%
  cols_label(percent = "proportion")
| CATEGORICAL_VARIABLE1 | n | proportion |
|---|---|---|
| A | 10 | 0.5 |
| B | 10 | 0.5 |
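For a single variable, `tabyl()` returns two columns: the count `n` for each level, and `percent`, which is the proportion `n / sum(n)`. That is why it can be relabelled "proportion". By default `tabyl()` also adds a `valid_percent` column when the variable has missing values. As a rough cross-check, the same two columns can be computed with dplyr's `count()`. The toy data below are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: three rows in level A, one row in level B.
toy_data <- tibble(CATEGORICAL_VARIABLE1 = factor(c("A", "A", "A", "B")))

counts <- toy_data %>%
  count(CATEGORICAL_VARIABLE1) %>%  # one row per level, with a count column n
  mutate(percent = n / sum(n))      # proportion of all rows in each level

counts
```
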
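janitor's adorn_* functions can tidy a tabyl further before it is passed to gt(). Below, adorn_totals("row") adds a Total row and adorn_pct_formatting() displays the proportions as percentages.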
EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  adorn_totals("row") %>%
  adorn_pct_formatting() %>%
  gt()
| CATEGORICAL_VARIABLE1 | n | percent |
|---|---|---|
| A | 10 | 50.0% |
| B | 10 | 50.0% |
| Total | 20 | 100.0% |
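`adorn_pct_formatting()` turns the percent column into text such as "50.0%", so `gt()` no longer treats it as a number. If you would rather keep the proportions numeric and let gt handle the display, gt's `fmt_percent()` is an alternative. The sketch below shows both options on hypothetical toy data.

```r
library(dplyr)
library(tibble)
library(janitor)
library(gt)

# Hypothetical toy data with two equally common levels.
toy_data <- tibble(CATEGORICAL_VARIABLE1 = factor(c("A", "A", "B", "B")))

# Option 1: janitor formats the percentages as character strings.
tab_text <- toy_data %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  adorn_totals("row") %>%
  adorn_pct_formatting()

# Option 2: keep the proportions numeric and let gt format them.
tab_numeric <- toy_data %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  adorn_totals("row")

gt_numeric <- tab_numeric %>%
  gt() %>%
  fmt_percent(columns = percent, decimals = 1)
```

Keeping the column numeric can be convenient if you later want to sort, colour or otherwise process the values inside gt.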
EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  gt()
| CATEGORICAL_VARIABLE1 | N | Y |
|---|---|---|
| A | 2 | 8 |
| B | 7 | 3 |
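A two-way `tabyl()` is a contingency table. It has one row per level of the first variable and one column per level of the second, with counts in the cells. To double-check the counts, base R's `table()` gives the same cross-tabulation. The toy data below are hypothetical.

```r
library(dplyr)
library(tibble)
library(janitor)

# Hypothetical toy data.
toy_data <- tibble(
  CATEGORICAL_VARIABLE1 = factor(c("A", "A", "A", "B", "B")),
  CATEGORICAL_VARIABLE2 = factor(c("N", "Y", "Y", "N", "N"))
)

# janitor: a data frame with one column per level of CATEGORICAL_VARIABLE2.
two_way <- toy_data %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2)

# Base R: the same counts as a table object.
base_counts <- table(toy_data$CATEGORICAL_VARIABLE1,
                     toy_data$CATEGORICAL_VARIABLE2)
```
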
Adding the name of the second categorical variable as a spanner label above its levels (N and Y here) makes it clear what the columns represent.
EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  gt() %>%
  tab_spanner(
    label = "CATEGORICAL_VARIABLE2",
    columns = c(N, Y))  # the levels of CATEGORICAL_VARIABLE2; replace with your own
| | CATEGORICAL_VARIABLE2 | |
|---|---|---|
| CATEGORICAL_VARIABLE1 | N | Y |
| A | 2 | 8 |
| B | 7 | 3 |
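Listing the levels in `columns = c(N, Y)` means the code has to change whenever the second variable has different levels. gt's `columns` argument accepts tidyselect expressions, so one alternative is to select every column except the row variable. This is a sketch on hypothetical toy data, assuming a version of gt that uses tidyselect for `columns`.

```r
library(dplyr)
library(tibble)
library(janitor)
library(gt)

# Hypothetical toy data.
toy_data <- tibble(
  CATEGORICAL_VARIABLE1 = factor(c("A", "A", "B", "B")),
  CATEGORICAL_VARIABLE2 = factor(c("N", "Y", "N", "N"))
)

spanned <- toy_data %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  gt() %>%
  tab_spanner(
    label = "CATEGORICAL_VARIABLE2",
    columns = -CATEGORICAL_VARIABLE1  # every column except the row variable
  )
```
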
You can then use janitor's adorn_* functions to show row percentages alongside the counts.
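EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  adorn_percentages("row") %>%          # each row sums to 100%
  adorn_pct_formatting(digits = 0) %>%  # whole-number percentages
  adorn_ns(position = "front") %>%      # counts first, percentages in parentheses
  gt() %>%
  tab_spanner(
    label = "CATEGORICAL_VARIABLE2",
    columns = c(N, Y))                  # the levels of CATEGORICAL_VARIABLE2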
| | CATEGORICAL_VARIABLE2 | |
|---|---|---|
| CATEGORICAL_VARIABLE1 | N | Y |
| A | 2 (20%) | 8 (80%) |
| B | 7 (70%) | 3 (30%) |
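`adorn_percentages("row")` divides each count by its row total, so every row sums to 100%. The other options are `"col"`, which divides by column totals, and `"all"`, which divides by the grand total. Which denominator is appropriate depends on the comparison you want the table to support. Here is a quick numeric check of the row version before the formatting steps, on hypothetical toy data.

```r
library(dplyr)
library(tibble)
library(janitor)

# Hypothetical toy data.
toy_data <- tibble(
  CATEGORICAL_VARIABLE1 = factor(c("A", "A", "A", "B", "B")),
  CATEGORICAL_VARIABLE2 = factor(c("N", "Y", "Y", "N", "N"))
)

# Row proportions: each count divided by the total of its row.
row_props <- toy_data %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  adorn_percentages("row")

row_props
```
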
© Statistical Consulting Centre, University of Melbourne, 2023